Beyond Term Indexing: A P2P Framework for Web Information Retrieval

نویسندگان

  • Ivana Podnar Zarko
  • Martin Rajman
  • Toan Luu
  • Fabius Klemm
  • Karl Aberer
چکیده

Web search over peer-to-peer (P2P) networks shows promise to become an alternative to the state-of-the-art search engines since P2P overlays offer means for decentralized search across widely-distributed document collections. However, the design of effective techniques for P2P indexing and retrieval raises a number of technical challenges due to potentially unscalable resource (e.g. bandwidth, storage) consumption. The paper presents a framework for full-text information retrieval in structured P2P networks and introduces a novel retrieval model based on highly discriminative keys—terms and term sets appearing in a restricted number of documents—that ensure efficient and scalable retrieval. Our goal is to design scalable techniques for building a global key index in structured P2P overlays for large document collections. We present experimental results that show acceptable indexing and retrieval costs while the retrieval quality is comparable to standard centralized solutions with BM25 relevance computation scheme.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Highly Discriminative Keys for Indexing in a Peer-to-Peer Full-Text Retrieval System

Excessive network bandwidth consumption, caused by the transmission of long posting lists, was identified as one of the major bottlenecks for implementing distributed full-text retrieval in a Peer-toPeer (P2P) architecture. To address this problem we introduce a novel approach to indexing using highly discriminative terms and term sets, which leads to short posting lists and therefore reduces t...

متن کامل

Building a peer-to-peer full-text Web search engine with highly discriminative keys

Web search engines designed on top of peer-to-peer (P2P) overlay networks show promise to enable attractive search scenarios operating at a large scale. However the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low maintenance costs are favorable properties of P2P over...

متن کامل

Bridging the P2P and WWW Divide with DISCOVIR - DIStributed COntent-based Visual Information Retrieval

In the light of image retrieval evolving from text annotation to content-based and from standalone applications to web-based search engines, we foresee the need for deploying content-based image retrieval (CBIR) into Peer-to-Peer (P2P) architecture. By doing so, we not only distribute the tasks of feature extraction, indexing and storage of image data into peers, we also introduce another aspec...

متن کامل

Text Based Approaches for Content Based Image Retrieval in a P2P Network

The tremendous growth of digital multimedia content on the web requires scalable, efficient, and effective information retrieval mechanisms. Handling such large collections of data in a centralized way requires costly high bandwidth connectivity and powerful servers. This establishes the need of distributed architectures, such as peer-to-peer systems, that allow sharing of data management and s...

متن کامل

A Scalable Semantic Indexing Framework for Peer-to-Peer Information Retrieval

The exponential growth of data demands scalable and adaptable infrastructures for indexing and searching a huge amount of data sources with high accuracy and efficiency. Existing centralized search engines are not scalable and suffer from single-point-offailures. The recent work on P2P index construction partitions the document vectors either randomly or statically, making it difficult to trade...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Informatica (Slovenia)

دوره 30  شماره 

صفحات  -

تاریخ انتشار 2006